ENH: improve performance of read_dataframe if a filter is used #577

theroggy · 2025-09-13T01:10:50Z

In read_dataframe without arrow, the number of rows of the result was counted first, and then the full data was read.

Counting the rows can take significant time, especially when a filter is used. If the filter excludes most rows, counting them can even take as long as the subsequent read of the data.

This PR removes the row count before reading to improve performance.
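To illustrate the cost described above, here is a minimal sketch in plain Python. It is not the code in `_io.pyx`. `CountingSource`, `read_two_pass` and `read_single_pass` are hypothetical names made up for this example, and the assumption is that every pass over a filtered layer has to re-evaluate the filter for each row.

```python
from typing import Callable, Iterator, List, Sequence


class CountingSource:
    """Hypothetical stand-in for a filtered layer.

    Tracks how many times the filter is evaluated, as a proxy for I/O cost.
    """

    def __init__(self, rows: Sequence[int], predicate: Callable[[int], bool]):
        self._rows = list(rows)
        self._predicate = predicate
        self.evaluations = 0

    def __iter__(self) -> Iterator[int]:
        # Every full iteration re-applies the filter to every row.
        for row in self._rows:
            self.evaluations += 1
            if self._predicate(row):
                yield row


def read_two_pass(source: CountingSource) -> List[int]:
    """Count the matching rows first, preallocate, then read the data."""
    n = sum(1 for _ in source)  # pass 1: count only
    result = [0] * n
    for i, row in enumerate(source):  # pass 2: actual read
        result[i] = row
    return result


def read_single_pass(source: CountingSource) -> List[int]:
    """Read directly into a growable container, without counting first."""
    result: List[int] = []
    for row in source:
        result.append(row)
    return result


def is_match(row: int) -> bool:
    # A selective filter: keeps 10 of 1000 rows.
    return row % 100 == 0


two_pass_source = CountingSource(range(1000), is_match)
single_pass_source = CountingSource(range(1000), is_match)

two_pass_result = read_two_pass(two_pass_source)
single_pass_result = read_single_pass(single_pass_source)
```

In this sketch both functions return the same rows, but the two-pass version evaluates the filter twice per row. With a selective filter the counting pass costs about as much as the read itself, which matches the behaviour described above.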

theroggy added 4 commits September 8, 2025 09:02

ENH: improve performance of read_dataframe

f73eae3

ENH: improve performance of read_dataframe if a filter is used

ec645b5

Update CHANGES.md

1fad77a

Update _io.pyx

af294f2

theroggy marked this pull request as ready for review September 13, 2025 15:49

theroggy marked this pull request as draft September 13, 2025 15:50

theroggy added 3 commits September 13, 2025 20:17

Try to fix for pandas 3

d3fcf15

Update test_geopandas_io.py

8e1328a

Update test_geopandas_io.py

6b816c3

theroggy marked this pull request as ready for review September 13, 2025 20:30

theroggy modified the milestones: 0.11.0, 0.12.0 Sep 14, 2025
